Starting at $0.44 per DPU-hour, billed per second, 1 minute minimum

Overview

What is AWS Glue?

AWS Glue is a managed extract, transform, and load (ETL) service designed to make it easy for customers to prepare and load data for analytics. With it, users can create and run an ETL job in the AWS Management Console.…

Recent Reviews

Software developer

7 out of 10

October 23, 2023

The main concern with AWS Glue is the high cost of Glue jobs. I worked with a 5000 dataset and was facing some performance issues as …

AWS Glue is a good data catalog and integration service

9 out of 10

September 14, 2022

Incentivized

We heavily rely on AWS Glue for cataloging our data objects (tables and views). We use AWS Glue as our Data Catalog and use it in our data …

Unmatchable serverless computing.

9 out of 10

September 13, 2022

Incentivized

The automation of numerous tasks, including logging, alerting, monitoring, etc., is made possible by AWS Glue. Additionally, it is …

Reviewer Pros & Cons

Data transformation
Complexity transformation

Ashutosh Mishra

Engineer in Information Technology

Curitics health (Health, Wellness and Fitness, 201-500 employees)

Pricing

per DPU-Hour

$0.44

Cloud

billed per second, 1 minute minimum

Entry-level set up fee?

No setup fee
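
As a rough illustration of how the listed rate translates into a job cost, here is a minimal sketch. It assumes the $0.44 per DPU-hour rate and the 1 minute minimum shown above apply to the whole run, and that the DPU count stays constant. The function name `estimate_job_cost` is hypothetical, and actual AWS billing may differ by region and job type.

```python
# Rate and minimum taken from the pricing listed on this page.
PRICE_PER_DPU_HOUR = 0.44      # USD per DPU-hour
MINIMUM_BILLED_SECONDS = 60    # "1 minute minimum"


def estimate_job_cost(dpus: float, runtime_seconds: float) -> float:
    """Estimate the cost in USD of one job run (hypothetical helper).

    Assumes a constant DPU count and per-second billing with a
    60-second floor, as stated in the pricing summary.
    """
    billed_seconds = max(runtime_seconds, MINIMUM_BILLED_SECONDS)
    dpu_hours = dpus * billed_seconds / 3600
    return round(dpu_hours * PRICE_PER_DPU_HOUR, 4)


if __name__ == "__main__":
    # 10 DPUs for 15 minutes: 10 * 0.25 h * $0.44
    print(estimate_job_cost(10, 15 * 60))
    # A 30-second run is billed as a full minute.
    print(estimate_job_cost(2, 30))
```

Under these assumptions, a 10-DPU job that runs for 15 minutes comes to 10 × 0.25 × $0.44 = $1.10, and any run shorter than a minute costs the same as a one-minute run.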

Offerings

Free Trial
Free/Freemium Version
Premium Consulting/Integration Services

Product Details

AWS Glue Technical Details

Deployment Types: Software as a Service (SaaS), Cloud, or Web-Based
Operating Systems: Unspecified
Mobile Application: No

Reviews and Ratings

(31)

Attribute Ratings

Support Rating: 7 (1 rating)

Reviews

July 25, 2023

AWS Glue ETL tool

Verified User

Team Lead in Information Technology

Computer Software Company, 5001-10,000 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use AWS Glue to create ETL pipelines that transform and move data from different sources such as S3, Snowflake, and Postgres to Redshift and vice versa. Running Spark jobs is really easy because Glue auto-generates code that securely establishes connections with the source and target databases and helps cleanse the data, for example through deduplication and data validation. Because it is serverless, it automatically scales the memory resources required to run the Spark Glue job up and down.
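
To make the cleansing steps mentioned in this review concrete, here is a minimal plain-Python sketch of deduplication and validation. It is not the code Glue generates and does not use Spark or any Glue API. The function names and the `id` and `email` fields are hypothetical, chosen only for illustration.

```python
from typing import Any


def deduplicate(records: list[dict[str, Any]], key: str) -> list[dict[str, Any]]:
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value in seen:
            continue  # a record with this key was already kept
        seen.add(value)
        unique.append(record)
    return unique


def validate(records: list[dict[str, Any]], required_fields: list[str]):
    """Split records into (valid, rejected) based on required fields."""
    valid, rejected = [], []
    for record in records:
        # A field counts as present if it is neither missing nor empty.
        if all(record.get(f) not in (None, "") for f in required_fields):
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected


if __name__ == "__main__":
    rows = [
        {"id": 1, "email": "a@example.com"},
        {"id": 1, "email": "a@example.com"},  # duplicate id
        {"id": 2, "email": ""},               # fails validation
    ]
    valid, rejected = validate(deduplicate(rows, "id"), ["id", "email"])
    print(len(valid), len(rejected))
```

In a Spark job the same idea would typically be expressed with DataFrame operations rather than Python loops; the loop form is used here only to keep the sketch self-contained.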

Pros and Cons

Execution of spark jobs
Scaling of memory resources
Crawling the schemas

Incremental data sync
Real time data triggers
Grouping of small files

Likelihood to Recommend

Glue is well suited to ETL operations and jobs, especially when we want to transform or extract data from sources stored in the AWS cloud. It is very well integrated with the other AWS services, and it is easy to establish connections. We can schedule the crawlers or run them on demand.

Most Important Features

Extract
Transform
Load
Spark jobs
Crawling
Connections

Return on Investment

Easy to use
Less training required
Detailed documentation is available, hence implementation is easy

Alternatives Considered

Talend Open Studio

AWS Glue is easier to use and has more and better features than Talend Open Studio. More documentation, tutorials, and labs about AWS Glue are widely available on the internet, which makes implementing Spark jobs easier. Auto scaling is an added advantage. Its serverless features make it stand out from the others.

Other Software Used

Talend Open Studio, Informatica Data Prep, IBM Netezza Performance Server

September 20, 2022

Great for ETL and batch processing

Verified User

Engineer in Information Technology

Information Technology & Services Company, 501-1000 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

1) In my current use case, we mainly use AWS Glue for extract, transform, load (ETL) to process batch data on a daily basis. 2) The main problem Glue solves for us is integration: it integrates easily with other AWS services like S3, RDS, and Athena. 3) The pricing model is pay-as-you-go. 4) The main business problem Glue solves is that we can ingest the data, perform ETL on top of it, and create Spark or Python shell jobs.

Pros and Cons

Extract, transform, load
AWS Data Catalog
Triggers
Workflow creation

The in-stream schema registries feature is hard to use efficiently.
More connectors could be added to the Connections feature.
The crucial problem with AWS Glue is that it only works with AWS.

Likelihood to Recommend

Well suited: when you want to transform your data, Glue provides its own transformations, including an option for PII masking of data if you do not want to take a code-based approach. A second scenario is when you want to integrate Glue with other AWS services and run Spark on Glue for faster processing. Less appropriate: when you want to integrate with services outside of AWS. It also does not support Java as of now, so if you have Java resources, you cannot use them with it.
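
For readers wondering what PII masking amounts to when done in code, here is a minimal sketch in plain Python. It is a stand-in for illustration only and is not how Glue's built-in transformation works. The function name `mask_pii`, the `salt` parameter, and the sample field names are all hypothetical.

```python
import hashlib
from typing import Any


def mask_pii(record: dict[str, Any], pii_fields: list[str], salt: str = "") -> dict[str, Any]:
    """Return a copy of `record` with the listed fields replaced by a hash.

    Uses a truncated, salted SHA-256 digest, so the same input always
    maps to the same token. This is pseudonymization, not anonymization.
    """
    masked = dict(record)  # leave the caller's record untouched
    for field in pii_fields:
        value = masked.get(field)
        if value is not None:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            masked[field] = digest[:16]
    return masked


if __name__ == "__main__":
    row = {"name": "Jane Doe", "city": "Pune"}
    print(mask_pii(row, ["name"], salt="demo-salt"))
```

Because the token is deterministic, masked records can still be joined or grouped on the masked field, which is often the reason for hashing rather than blanking the value.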

Most Important Features

AWS Glue studio
crawler
notebook
triggers
workflows

Return on Investment

Positive impact: after ETL, we are able to automate some tasks.
Negative impact: at some point it can drive up cost, but not significantly.

Alternatives Considered

Talend Open Studio

The main reasons we chose AWS Glue over Talend Open Studio are that Talend Open Studio 1) does not support Spark, 2) runs only on Java, 3) is not really a feasible solution for heavy workloads, 4) needs customer support in most cases, and 5) has no proper documentation available.

Other Software Used

Amazon Elastic Compute Cloud (EC2), AWS Database Migration Service, Snowflake

July 03, 2022

AWS Glue - The managed ETL service for your data

Apurv Doshi

Practice Head - Labs (Innovation and R&D)

Infostretch (Information Technology and Services, 5001-10,000 employees)

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use AWS Glue for ETL of healthcare data. The input data comes from different source systems and therefore in different formats. With the help of AWS Glue jobs, we translate the data into a common format. Using Python scripts and the scheduled job feature, the data is fetched periodically, processed by the Python script, converted to the Parquet format, and stored in an S3 bucket. The Glue catalog generates the schema of the stored data and allows AWS Athena to query it for analytics purposes.
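
The step of translating differently formatted inputs into a common format can be sketched in plain Python as a per-source field mapping. This is only an illustration of the idea, not the reviewer's script. The source names, field names, and the `to_common_format` helper are hypothetical, and the Parquet conversion and S3 upload are left out.

```python
from typing import Any

# Hypothetical mapping: common field name -> field name used by each source.
FIELD_MAPS = {
    "clinic_a": {"patient_id": "pid", "visit_date": "date"},
    "clinic_b": {"patient_id": "PatientID", "visit_date": "VisitDate"},
}


def to_common_format(record: dict[str, Any], source: str) -> dict[str, Any]:
    """Rename a source record's fields to the common schema.

    Fields missing from the source record become None; fields that are
    not part of the mapping are dropped.
    """
    mapping = FIELD_MAPS[source]
    return {common: record.get(src) for common, src in mapping.items()}


if __name__ == "__main__":
    a = to_common_format({"pid": 7, "date": "2022-07-03"}, "clinic_a")
    b = to_common_format({"PatientID": 9, "VisitDate": "2022-07-04"}, "clinic_b")
    print(a)
    print(b)
```

Keeping the mapping in data rather than in code means a new source system only needs a new entry in `FIELD_MAPS`, assuming its records differ only in field names.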

Pros and Cons

It is extremely fast, easy, and intuitive. Though it is a suite of services, it takes very little time to get comfortable with it.
As it is a managed service, one need not take care of a lot of underlying details. The identification of data schema, code generation, customization, and orchestration of the different job components allow the developers to focus on the core business problem without worrying about infrastructure issues.
It is a pay-as-you-go service, so there is no need to provision any capacity in advance. This makes scheduling much easier.

The sample code is quite basic and should cover more scenarios. However, you can find good pointers on the internet and from the AWS community and tickets.
AWS Glue runs on Apache Spark. So, to get the best out of the AWS Glue service, the developer should have a good understanding of Apache Spark.

Likelihood to Recommend

When the data which requires ETL has different formats, schema, and volume, this service suits them best. So, when the volume is not consistent (typical use-case of healthcare and online shopping), AWS Glue can be the prime choice. When the data is available in both batch and streaming mode, the developer needs to generate a separate codebase. This increases the source code management efforts. So, prefer to go with Glue when the nature of the data is the same (either batched or streamed).

Most Important Features

AWS Glue Data Catalog for writing efficient queries.
AWS Glue Crawler for automatic schema recognition.
AWS Glue scheduled jobs for performing certain ETL tasks at a defined interval.

Return on Investment

We were transforming the data using a simple Python script and were facing a lot of orchestration issues. The script failed frequently because the nature of the data was rather dynamic. With the help of AWS Glue, we could fix ~80% of the orchestration issues. With the help of automatic schema generation, the dynamic nature of the data is also addressed very well. So, we started realising the ROI from day 1.

Alternatives Considered

AWS Data Pipeline

Glue comes in the form of a managed service, whereas AWS Data Pipeline puts the additional responsibility of managing the infrastructure on us. We did not require the fine-grained control of the hardware that AWS Data Pipeline provides. We also wanted to park our data in DynamoDB. AWS Glue allows storing the data in DynamoDB, but the same is not possible with AWS Data Pipeline. So, we decided to move ahead with AWS Glue.

Other Software Used

Amazon SageMaker, Alexa, Amazon Lex

January 13, 2020

AWS Glue : a fully managed ETL service

Verified User

Employee in Research & Development

Computer Software Company, 501-1000 employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

AWS Glue is one of the more straightforward and quick cloud-based ETL tools. It comes under the umbrella of AWS services. We use AWS Glue to analyze an extensive data set of USA-based clinics and hospitals. It is HIPAA compliant for sensitive data. It supports Python scripts and a scheduler, and it works very well with other AWS services like S3 and RDS.

Pros and Cons

Very quick for ETL jobs.
Both the UI and the command-line interface need very few steps to create and schedule an ETL job.

Sample code is very basic and not available for most scenarios.

Likelihood to Recommend

AWS Glue is best if your organization is dealing with large and sensitive data like medical records. It comes with a scheduler and easy deployment for AWS users. The data catalog keeps the reference of the data in a well-structured format. If you are already part of the AWS services, then AWS Glue is the best choice; otherwise, it's not a simple one for deployment.

Return on Investment

Helps to leverage the data in real time.
Leads to quick and clear business decision making.

Alternatives Considered

Azure Data Catalog, Azure Data Factory, DigitalOcean and Google Cloud Dataflow

We are already using AWS services, so AWS Glue is the first choice for us. But for the comparison of ETL job making and process time, it's way faster for other services.

Support Rating

Amazon responds in good time once a ticket has been generated, but we need to generate tickets frequently because very few code samples are available and they do not cover all the scenarios.